Test developers are continually exploring the possibilities Computer Based Assessment 
(CBA) offers the Mathematics domain. This paper describes the trial of the Place Value 
Assessment Tool (PVAT) and its online equivalent, the PVAT-O. Both tests were 
administered using a counterbalanced research design to 253 Year 3-6 students across nine 
classes at a primary school in Melbourne. The findings show that while both forms are valid and 
comparable, the online mode was preferred by teachers. The affordances and constraints of 
using CBA in the formative assessment process are explored. 


Over the past 10 years there has been a rapid uptake of Mathematics Computer Based 
Assessments (CBA) in Australian primary schools. Commercial firms have identified 
teachers as eager consumers in this market. Companies are acutely aware of the friction 
points for teachers: the challenges around creating their own formative assessments and 
time-consuming marking. This has led to the development of several increasingly popular 
CBA formative assessment “programs”. Yet, for these programs to be the panacea their 
advertising suggests, schools must be confident they provide valid formative data teachers 
can easily interpret and apply. 

Currently in Australia, there are very few comprehensive formative whole number place 
value assessments for Years 3-6 students. To address this, a Rasch analysis-based 
methodology was used to develop a valid and reliable whole number place value paper-and- 
pen assessment, called the Place Value Assessment Tool (PVAT) (see Rogers, 2014). While 
the PVAT provided a detailed picture of student knowledge in the construct, the time taken 
to mark (5-7 minutes per student) was seen as a potential obstacle for teachers. To address 
this, the researcher investigated if a comparable online version of the test could be created. 


Relevant Literature 


Place value knowledge has been compared to the framework of a house, such that if a 
student’s knowledge in this area is shaky, his/her understanding of mathematics as a whole 
is affected (Major, 2011). An understanding of place value has been shown to be closely 
related to students’ sense of number (McIntosh et al., 1992), understanding of decimals 
(Moloney & Stacey, 1997), and comprehension of multi-digit operations (Fuson, 1990). 
Underpinning almost every aspect of the mathematics curriculum, it is an integral part of the 
primary school syllabus. Yet there is considerable evidence to suggest students struggle with 
whole number place value well into lower secondary school (Thomas, 2004; Wade et al., 
2013). Research has shown that place value is often taught superficially, something that can 
be attributed to the lack of quality formative assessments available in this construct (Major, 
2011; Rogers, 2014). 

An assessment is essentially a sample of selected tasks intended to allow inferences to 
be made about a student’s level of achievement. The strength of these inferences relies 
heavily on the quality of the tasks used (Izard, 2002). An assessment which includes a 
selection of items that are too easy, or too difficult, will not provide teachers with a complete 
picture of each student’s knowledge. Similarly, an assessment that does not comprehensively 
cover the required content may cause the omitted content to be devalued by teachers (Webb, 
2007). In both cases, the inaccurate inferences drawn from these assessments adversely 
influence the quality of instruction. Formative assessment is a process that provides teachers 
with information that can be used to support individual students’ future learning (Popham, 
2018). It is one of the most effective, empirically proven, processes that teachers can use to 
improve student performance. 

An important consideration when developing assessments is practicality (Masters & 
Forster, 1996). If an assessment instrument does not justify the time or money required for 
its administration and marking, it will not be implemented by schools. Doig (2011) noted 
that some educators (despite appreciating the quality of data they received) avoided using 
interview-based assessments simply because of their administration time. As a result, many 
schools consider paper-and-pen tests a more practical assessment option, particularly with 
older students. Proponents of interview-based assessments disagree, stating clinical 
interviews provide higher quality assessment information and enhance teacher knowledge 
of common misconceptions in mathematics (Clements & Ellerton, 1995). While 
mathematics assessments have traditionally been delivered via paper-and-pen or interview 
(Griffin et al., 2012), the accessibility of technology has seen test developers investigate the 
many opportunities provided by CBA (ACARA, 2021). 

CBA’s major advantage is that it delivers traditional assessment in a more efficient and 
effective manner (Bridgeman, 2009). CBA has the potential to save teachers time marking 
test papers and means results can be used to guide instruction in a timelier manner (Tomasik 
et al., 2018). Yet, as Thompson and Weiss (2011) explain, many schools’ technological 
capabilities fall short of the standard required to successfully implement CBA, leading to test 
administration problems (McGowan, 2019). Thus, while CBA has great potential in schools, 
further logistical work is required to ensure its success. 

Much research associated with CBA has explored the comparison of traditional paper- 
and-pen based tests with their CBA equivalent (e.g., Wang et al., 2007; Thompson & Weiss, 
2011). Wang et al. (2007) conducted a meta-analysis of 44 mathematics-based assessments 
comparing paper-and-pen and CBA versions of the same test. Overall, they reported that the 
mode of administration did not have a substantive effect on the students’ performance (ES 
= -0.059). These comparisons aimed to determine whether online and paper versions of the 
same test could be used interchangeably. This is an important practical consideration, as 
comparable tests allow schools the flexibility to choose the most appropriate mode for their 
context. Yet, as Popham (2018) suggests, the decisions around test selection rely heavily on 
the assessment literacy of teachers and school leaders. 

Popham (2018) defines assessment literacy as an “individual’s understanding of the 
fundamental assessment concepts and procedures deemed likely to influence educational 
decisions” (p.13). An assessment literate teacher makes informed choices around the 
assessments they use, and accurately applies the results to guide their instruction. Research 
has shown that assessment literacy is not usually a focus of teacher education, meaning most 
teachers have poor levels (Stiggins, 2006). While providing teachers with assessment 
literacy professional development has been shown to be effective (Xu & Brown, 2016), 
without access to this, teachers are left to develop these skills ‘on the job’. As CBA is a 
relatively new mode of assessment, it is realistic to assume that teachers need support to 
develop their assessment literacy in this mode. As Popham (2008) points out, being provided 
with assessment data is only the beginning of the process — teachers need the assessment 
literacy skills to understand a test’s construction so they can successfully interpret the data. 

One proven method of test construction is Item Response Modelling (IRM), which has 
well-established methods for analysis (Wright & Masters, 1982). IRM measures the 
relationship between student achievement and item difficulty on the same scale (Wright & 
Stone, 1979). IRM has been successfully applied to a variety of test modes and used in large- 
scale assessments through to high-quality classroom-based assessment tools including PAT- 
M (Australian Council for Educational Research, 2012) and the Scaffolding Numeracy in the 
Middle Years (SNMY) assessment (Siemon et al., 2006). A popular IRM model, devised by 
Rasch (1960) is used in this research. Rasch analysis is based around the interplay of 
candidates and items in an assessment. While analysis of assessments traditionally generates 
a score that summarises the number of items correctly answered by students, Rasch considers 
the students who correctly answered each item (Izard, 2004). Rasch examines the extent to 
which the item distinguishes between those who are more and less knowledgeable (Izard et 
al., 2003). That is, the model assumes that less knowledgeable students have a lower 
probability of correctly answering a difficult item than those who are more knowledgeable 
(Rasch, 1960). Items that are considered not to follow this pattern do not fit the Rasch model 
and are generally removed from a test. This process verifies that the test content is 
meaningful and appropriate so that useful inferences can be made about the knowledge of 
candidates (Izard et al., 2003). Rasch allows different tests to be located on the same scale 
and allows test designers to determine if they are of comparable difficulty. The next section 
describes how quantitative Rasch based methods were used to compare the PVAT and 
PVAT-O, and the qualitative methods used to gather insights from teachers. 
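
The relationship between student achievement and item difficulty described above can be illustrated with the standard dichotomous Rasch model, in which the probability of a correct response depends only on the difference between the two values on a shared logit scale. The following is a minimal sketch, not the analysis software used in this research; the function name and the ability and difficulty values are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits on the same scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values (logits), chosen only for illustration.
hard_item = 1.0
p_less_knowledgeable = rasch_probability(-0.5, hard_item)
p_more_knowledgeable = rasch_probability(1.5, hard_item)

# The less knowledgeable student has the lower probability of success.
print(round(p_less_knowledgeable, 3), round(p_more_knowledgeable, 3))
```

When a student's achievement equals an item's difficulty, the modelled probability of success is exactly 0.5, which is what allows students and items to be located on one scale.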


Methodology 
PVAT-O Creation 


Multiple technologies including HyperText Markup Language (HTML5), JavaScript, 
and PHP: Hypertext Preprocessor (PHP) were used to create the PVAT-O assessment. The 
mathematical content and format of each PVAT-O item was as close as possible to the 
equivalent PVAT items. However, some items required the inclusion of computer-based 
features. For example, a ‘drag and drop’ feature was used in items requiring students to place 
numbers in order from smallest to largest and ‘radio buttons’ were used in multiple choice 
items. 


The Counterbalanced Trial 


The online and paper-and-pen PVAT trial was conducted at School C, a Catholic Primary 
school in metropolitan Melbourne where approximately 11% of students were from English 
as an Additional Language or Dialect (EAL/D) families (ACARA, 2020). All Year 3 to 6 
students (N = 253) from nine classes took part in the trial (Male= 47%, Female= 53%). The 
trial took place over a 2-week period in the school library and was supervised by both the 
researcher and the classroom teacher. The trial was conducted using a counterbalanced 
measures design (Shuttleworth, 2009). Half of the students in each class (randomly selected) 
completed the PVAT-O, whilst the other half of the class completed the paper-and-pen 
PVAT. Exactly one week later, the students completed the alternate version of the test. This 
research design was used to minimise the risk of factors such as learning effects and order of 
treatment adversely influencing the results of the trials (Perlini et al., 1998). Only 227 students (Male= 
45%, Female= 55%) completed both forms, due to absences and technical issues. 
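
The allocation described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the procedure actually used in the trial: the function name, the seed, and the class list are hypothetical, and the source does not say how the random selection was carried out.

```python
import random


def counterbalance(students, seed=None):
    """Randomly split one class into two groups with opposite test orders.

    Returns a dict mapping each student to (week 1 mode, week 2 mode).
    """
    rng = random.Random(seed)
    shuffled = list(students)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    schedule = {}
    for student in shuffled[:half]:
        schedule[student] = ("PVAT-O", "PVAT")   # online first, paper second
    for student in shuffled[half:]:
        schedule[student] = ("PVAT", "PVAT-O")   # paper first, online second
    return schedule


# Hypothetical class of 28 students.
class_list = [f"student_{i}" for i in range(1, 29)]
schedule = counterbalance(class_list, seed=1)
online_first = sum(1 for order in schedule.values() if order[0] == "PVAT-O")
print(online_first, len(schedule) - online_first)
```

Because every student sits both modes and the order is balanced within each class, any practice effect from the first sitting is spread evenly across the two modes.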


Teacher Surveys 


A short survey was given to the nine Year 3 to 6 classroom teachers (Female=100%). 
The purpose of this survey was to gain an indication of the teachers’ preferred testing mode. 
When the survey occurred, the teachers had not yet received their students’ results from the 
PVAT-O database, but it was explained they would receive each student’s raw score in a 
spreadsheet. Due to the small sample size, the survey data was interpreted by the researcher 
and reported as individual responses (Neuman, 2006). 


Rasch Analysis 


The paper PVAT tests were scored and coded by the researcher. The PVAT-O was 
scored by the PVAT-O database and then rechecked by the researcher to ensure consistency 
and accuracy. In order to determine if the PVAT and PVAT-O could be considered valid 
tests and comparable in their mean item difficulty and mean student achievement (Kolen & 
Brennan, 2004), three Rasch analyses were conducted: 

• Run A was conducted to re-confirm that the paper-and-pen PVAT was a valid and 
reliable test. The items which fit the model were used to create an anchor file for Run 
C. This allowed the PVAT and PVAT-O items to be placed on the same scale. 
• Run B looked at the PVAT-O items in isolation. Rasch analysis was used to determine 
which PVAT-O items fit the model and determine if it was an internally consistent test. 
• Run C investigated if the PVAT and PVAT-O could be placed on the same uni- 
dimensional scale and thus determine if they were comparable in item difficulty and 
student achievement. 

The anchor file from Run A was used to fix the difficulty estimates of the PVAT items that 
fit the model. This allowed the PVAT-O items to be calibrated against the PVAT items 
(Izard, 2005). The mean item difficulty and mean student achievement for the PVAT and 
PVAT-O were then calculated from this run. Effect Size measures were used to quantify the 
standardised mean difference between the two tests (Izard, 2004). Cohen’s (1969) 
descriptors for the magnitude of Effect Sizes, alongside the assigned ranges for each 
descriptor as suggested by Izard (2004), were then used to describe the Effect Sizes in 
plain language. 
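
A standardised mean difference of the kind described above can be sketched as follows. The source does not state which Effect Size formula was applied, so the pooled standard deviation form used here is an assumption, and the item difficulty values are invented purely for illustration. Only the “very small (0.00 to 0.14)” range is taken from the text; the other descriptor ranges are not reproduced.

```python
import math
from statistics import mean, stdev


def standardised_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation (assumed form)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


def is_very_small(effect_size):
    """True if the magnitude falls in the 'very small (0.00 to 0.14)' range."""
    return abs(effect_size) <= 0.14


# Hypothetical item difficulties in logits, not data from the trial.
paper_items = [-1.0, 0.0, 1.0]
online_items = [-0.9, 0.1, 1.1]
effect_size = standardised_mean_difference(paper_items, online_items)
print(round(effect_size, 2), is_very_small(effect_size))
```

Dividing by the pooled standard deviation expresses the difference in standard deviation units, so the same descriptors can be applied to item difficulty and to student achievement.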


Results 
Rasch Validation and Comparison 


The mean and standard deviation of the PVAT (n = 65) and PVAT-O (n = 59) items 
which fit the model in Run C were calculated to determine if the PVAT and PVAT-O could 
be considered comparable tests. The Effect Size for mean item difficulty was calculated to be 0.14, while 
the difference in student achievement between the tests was 0.01. An Effect Size of this magnitude is described as 
“very small (0.00 to 0.14)” (Izard, 2004, p. 8). This suggests there 
was not a substantive difference between the mean of item difficulties in the two modes, nor 
the students’ achievement (which is to be expected, given the tests were of similar difficulty). 


Teacher Survey 


The class teachers (N = 9) at School C completed a brief survey asking them to indicate 
their preferred mode of administration for the PVAT. Seven teachers preferred the PVAT- 
O, while two preferred the PVAT. The seven teachers who preferred the PVAT-O stated: 

‘It will save correcting it’ (Teachers #1, #2, #3) 

‘The results are immediate, I can use them the next day in my teaching’ (Teacher #4) 

‘If the computers all work, online is much better’ (Teacher #5) 

‘I don’t have to correct it...and I can use the results tomorrow’ (Teacher #6) 

‘The corrections would save me a lot of time and effort’ (Teacher #7) 

The two teachers who indicated they preferred the PVAT mode stated: 

‘Correcting them myself gives me a sense of their understanding’ (Teacher #8) 

‘I’m always concerned students will lose their responses’ (Teacher #9) 


The small sample of teachers completing this survey limits the inferences that can be made 
from the data. However, within this group of teachers there was a clear preference for the 
PVAT-O mode of test administration, largely due to the marking time it saved. 


Discussion 


Formative mathematics CBA continues to be embraced by schools, teachers and test 
developers. This research highlights several considerations when implementing formative 
CBA in classrooms: transparency, rigor, flexibility, and assessment literacy. 


Transparency 


While teachers in this research project were provided access to both the paper and CBA 
version of the test, this is not always the case. For example, in Computer Adaptive Tests 
(CAT) (Martin & Lazendic, 2018) each child is provided with a different set of items 
according to their responses. It is impossible for a teacher to view the combination of items 
individual students encounter, thus they are unable to judge their quality, appropriateness 
and relevance. Without this transparency, teachers are outsourcing the judgement of student 
knowledge to test designers. While somewhat appropriate in summative situations, 
eliminating teacher judgement in the formative assessment process should raise concerns for 
schools. Teacher #8 at School C echoed this ‘transparency’ constraint, indicating she was 
concerned about missing important diagnostic information in the PVAT-O. In response to 
this, the database was later adjusted to ensure teachers were provided with a summary of 
student responses to each item. The Specific Mathematics Assessments that Reveal Thinking 
(SMART) tests (University of Melbourne, 2012), are another platform that recognises the 
importance of allowing teachers to ‘see’ common student errors in the CBA mode. Doig 
(2011) reiterates this concern, noting that ‘off site marking’ does little to assist teachers to 
develop their knowledge of common student errors and misconceptions. Providing teachers 
with an overall raw score, rather than access to individual responses, is a major constraint of 
formative CBA and an issue which needs to be addressed by test designers. 


Rigor of the Assessment 


Wiliam (2007) states that formative assessment can effectively double the speed of 
student learning. Yet, as often happens in education, approaches can become diluted when 
commercial firms become involved. In order for schools and teachers to make informed 
decisions about the worth of formative CBA programs (particularly those produced 
commercially), it is critical teachers understand how to evaluate the rigor of a test’s 
construction. The results presented in this paper use Rasch analysis to show both the PVAT 
and the PVAT-O are valid and reliable tests. For schools, this is essential information as it 
means the test has been empirically proven and robustly constructed. While the relatively 
small sample size gathered from only one school limits the scope of conclusions that can be 
made from this trial, very little difference was detected between the mean item 
difficulties of the two tests. Similarly, student achievement was found to be 
comparable. This supports the results of the meta-analysis conducted by Wang et al. (2007), 
which noted that the mode of administration did not have a substantive effect on student 
achievement in computer-based and paper-based mathematics assessments. As Popham 
(2018) suggests, schools should be encouraged to contact test developers, ask for a test’s 
technical guide, and gather information related to the trialing, reliability and validity so they 
can make informed decisions about the suitability and rigor of tests. 


Flexibility 


Providing teachers with access to a comprehensive formative place value assessment that 
can be administered in two modes is considered to increase the usability and practicality of 
the PVAT tool. The PVAT-O was designed to support teachers by providing instant 
feedback on their students’ achievement and save them considerable time. As the online and 
paper PVAT tests were found to be comparable, teachers are now able to choose the mode 
which works best for them and their students. This flexibility is useful, as not all schools 
have the technological requirements to successfully implement CBA. As Csapo et al. (2012) 
note, at a minimum, a school requires the capacity to allow students completing the 
assessment concurrent access to the Internet while still supporting the Internet requirements 
of the rest of the school. As Huff and Sireci (2001) correctly note, when this does not occur, 
the validity of the test is threatened. In the PVAT-O trial it was noted that some computers 
took a great deal longer than others to move through the PVAT-O. This frustrated and 
disadvantaged the students working on the ‘slow’ computers. Teachers #5 and #9 both 
mentioned their concerns with the fragility of the technology at their school, stating “if the 
computers all work...” (Teacher #5) and “I’m always concerned students will lose their 
work” (Teacher #9). Providing teachers with a ‘back up’ paper version of the test is 
considered a practical way to alleviate these fears. 


Assessment Literacy 


Popham (2018) explains that educators who are not assessment literate often make 
inappropriate decisions about which tests to use. Using formative CBA is a relatively new 
form of mathematics assessment in schools, so it is critical teachers are helped to understand 
the affordances and constraints of these tools. Teachers are critical stakeholders in the 
formative CBA process. They are required to administer the assessment and their 
interpretation of the results influences its success (Jones & Truran, 2011). Seven of the nine 
teachers in this research described how they based their mode preference choice solely on 
the time it would save. Research by Melleti and Khademi (2018) showed that for both 
assessment literate and illiterate teachers, time was their main concern when implementing 
formative assessment. Yet interestingly, assessment literate teachers considered the time 
they spent creating and marking assessments a necessary part of the process. Thus, it appears 
that when teachers do not fully appreciate the advantages of formative assessment, they 
consider the time spent on it untenable. This reinforces the need to develop teachers’ 
assessment literacy skills around formative assessment, particularly in CBA (Popham, 
2018). Without appropriate professional development designed to increase assessment 
literacy, teachers will continue to focus on selecting assessments based on their perceived 
ease of administration and marking, rather than the quality of the tool. 


Conclusion 


The demands on a classroom teacher’s time have never been greater. Whilst a major 
affordance of CBA is the time it saves teachers, one of the major constraints is its lack of 
transparency. When a computer database marks student responses, a teacher’s judgment and 
involvement in the process are removed. This research suggests that, in order to retain the fidelity 
of the formative assessment process, teachers require access to professional development 
that aims to grow their assessment literacy skills. Developing these skills will encourage 
teachers to seek quality empirically proven assessments, and assist them to accurately 
interpret CBA data. 